Text Localization, Enhancement and Binarization in Multimedia Documents
Authors
Abstract
The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. One way to bring more semantic knowledge into the indexing process is to use the text contained in the images and video sequences. This text is rich in information yet easy to use, e.g. through keyword-based queries. In this paper we present an algorithm that localizes artificial text in images and videos using a measure of accumulated gradients followed by morphological post-processing. The quality of the localized text is improved by robust multiple-frame integration. A new technique for the binarization of the text boxes is proposed. Finally, detection and OCR results obtained with a commercial OCR system are presented.
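The localization idea named in the abstract (a measure of accumulated gradients followed by morphological post-processing) can be illustrated with a minimal sketch. This is not the authors' implementation: the window size, the threshold, the structuring-element width and every function name below are assumptions chosen for a toy example, and only a one-dimensional horizontal closing is shown.

```python
# Toy sketch of gradient-accumulation text localization (all parameters are assumptions).

def accumulated_gradients(img, win=5):
    """Sum absolute horizontal differences in a horizontal window around each pixel."""
    h, w = len(img), len(img[0])
    half = win // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Forward difference along the row; the last column has no right neighbour.
        grad = [abs(img[y][x + 1] - img[y][x]) for x in range(w - 1)] + [0]
        for x in range(w):
            lo, hi = max(0, x - half), min(w, x + half + 1)
            out[y][x] = sum(grad[lo:hi])
    return out

def threshold(acc, t):
    """Binary mask: 1 where the accumulated gradient reaches t."""
    return [[1 if v >= t else 0 for v in row] for row in acc]

def dilate_rows(mask, k=3):
    """Horizontal dilation with a 1 x k structuring element."""
    half, w = k // 2, len(mask[0])
    return [[max(row[max(0, x - half):min(w, x + half + 1)]) for x in range(w)]
            for row in mask]

def erode_rows(mask, k=3):
    """Horizontal erosion with a 1 x k structuring element."""
    half, w = k // 2, len(mask[0])
    return [[min(row[max(0, x - half):min(w, x + half + 1)]) for x in range(w)]
            for row in mask]

def close_rows(mask, k=3):
    """Morphological closing (dilation then erosion) to bridge small gaps."""
    return erode_rows(dilate_rows(mask, k), k)

def bounding_box(mask):
    """Return (x0, y0, x1, y1) of all set pixels, or None if the mask is empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))

def demo_image():
    """6 x 20 flat background with a high-contrast striped band in rows 2-3."""
    img = [[10] * 20 for _ in range(6)]
    for y in (2, 3):
        for x in range(5, 13, 2):
            img[y][x] = 200
    return img

acc = accumulated_gradients(demo_image(), win=5)
mask = close_rows(threshold(acc, t=500), k=3)
box = bounding_box(mask)
print(box)
```

Summing gradients over a window favours regions where many strong edges sit close together, which is typical of rendered text, while the closing step merges neighbouring responses into one candidate box. A real detector would also need vertical processing and geometric constraints on the resulting boxes.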
Related resources
A Survey on Various Approaches of Text Extraction in Images
Text extraction plays a major role in finding vital and valuable information. It involves detection, localization, tracking, binarization, extraction, enhancement and recognition of the text in a given image. Text characters are difficult to detect and recognize because of variations in size, font, style, orientation, alignment, contrast, complex colored, textured ...
An Enhancement of Images Using Recursive Adaptive Gamma Correction
The “Adaptive Approach for Historical or Degraded Document Binarization” addresses the large collections of ancient historical documents, printed or handwritten in native languages, that libraries and museums hold. Typically, only a small group of people is allowed access to such collections, as the preservation of the material is of great concern. In recent years, libraries have begun to digiti...
Robust binarization of degraded document images using heuristics
Historically significant documents are often discovered with defects that make them difficult to read and analyze. This fact is particularly troublesome if the defects prevent software from performing an automated analysis. Image enhancement methods are used to remove or minimize document defects, improve software performance, and generally make images more legible. We describe an automated, im...
Binarization of Low Quality Text Using a Markov Random Field Model
Binarization techniques have been developed in the document analysis community for over 30 years, and many algorithms have been used successfully. On the other hand, document analysis tasks are increasingly being applied to multimedia documents such as video sequences. Due to low resolution and lossy compression, the binarization of text included in the frames is a non-trivial task. ...
Phase-Based Binarization of Ancient Document Images
The main defects present in historical documents are darkness, non-uniform clarification, bleed-through and faded characters. Binarization is used to remove these defects. In this paper a phase-based binarization method is studied in which the phase of ancient document images is preserved. The method is divided into three steps: preprocessing, main binarization and post-processing. In prep...